LiveSP Installation Operating Guide
Check 2 – No high or medium service has recently restarted
Process
Command: livesp-stability
API Endpoint: /api/v1/service/stability
Status: equals “OK” if every service has been running for at least 2 days; otherwise “KO”.
Severity: equals “NONE” if every service has been running for at least 2 days; otherwise, the highest criticality among the services that have been running for less than 2 days.
Messages: one entry for each service that has been running for less than 2 days, including its criticality (see Severity above).
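The Status and Severity rules above can be illustrated with a small Python sketch. It is not the livesp-stability implementation: the `ServiceTask` record, the 48-hour constant and the `LOW` criticality level are assumptions made for illustration (only `MEDIUM` and `HIGH` appear in this guide), uptime is simplified to a number of hours, and the message text is modeled on the sample output in the Example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed ordering of criticality levels. MEDIUM and HIGH appear in this guide;
# LOW is a hypothetical extra level. Unknown levels rank lowest.
CRITICALITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}
MIN_UPTIME_HOURS = 48  # "running since at least 2 days"


@dataclass
class ServiceTask:
    service: str          # e.g. "livesp_bach"
    task_name: str        # e.g. "livesp_bach.1"
    running_hours: float  # time since the current task started
    criticality: str      # e.g. "HIGH"


def evaluate_stability(tasks: List[ServiceTask]) -> Tuple[str, str, List[str]]:
    """Return (status, severity, messages) for the 2-day stability rule."""
    recent = [t for t in tasks if t.running_hours < MIN_UPTIME_HOURS]
    if not recent:
        return "OK", "NONE", []
    # Severity is the highest criticality among recently restarted services.
    severity = max((t.criticality for t in recent),
                   key=lambda level: CRITICALITY_RANK.get(level, 0))
    messages = [
        f"KO - {t.service} - TaskName: {t.task_name} - "
        f"State: Running {t.running_hours:g} hours ago - Criticity: {t.criticality}"
        for t in recent
    ]
    return "KO", severity, messages
```

With a single HIGH task that has been running for 29 hours, this returns status KO, severity HIGH and a message identical to the one in the Example.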
Support action
1. Check whether a manual operation occurred in the last 2 days. If, for example, the service was manually killed, a LiveSP installation or upgrade took place, or the server was rebooted, the restart is expected.
2. What system message was reported when the task failed?
Run the command dksps <service_name> (for example, dksps livesp_bach).
3. Was there a Docker heartbeat issue (a network communication issue with the Swarm manager) on the node hosting this service? (Requires sudo rights.)
Connect to the hosting server with ssh $(getSwarmContainerNodeIP )
Run the command sudo journalctl -u docker --since "2 days ago" | grep 'heartbeat'
4. Check the service's application logs from just before it restarted.
Look for them in /data/logs or in the ELK user interface.
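Step 2 can be scripted when many tasks need reviewing. The sketch below assumes that `dksps` prints one row per task in the column order shown in the Example (ID, NAME, NODE, CURRENT STATE, ERROR) and that a leading `\_` marks an older task of the same slot, as in `docker service ps`. `Task`, `parse_dksps` and `failure_messages` are hypothetical names, not part of LiveSP.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Row layout inferred from the sample dksps output (assumption):
#   ID  [\_ ]NAME  NODE  <STATE> <age> ago  ["<error>"]
ROW = re.compile(
    r'^(?P<id>\w+)\s+'
    r'(?P<history>\\_\s+)?'          # "\_ " prefix marks a previous task
    r'(?P<name>\S+)\s+'
    r'(?P<node>\S+)\s+'
    r'(?P<state>\w+)\s+(?P<age>.+? ago)'
    r'(?:\s+"(?P<error>[^"]*)")?\s*$'
)


@dataclass
class Task:
    task_id: str
    name: str
    node: str
    state: str
    age: str
    error: Optional[str]
    is_current: bool


def parse_dksps(output: str) -> List[Task]:
    """Parse dksps output into tasks, skipping the header and unmatched lines."""
    tasks = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if m is None:  # header line or unexpected format
            continue
        tasks.append(Task(m["id"], m["name"], m["node"], m["state"],
                          m["age"], m["error"], m["history"] is None))
    return tasks


def failure_messages(tasks: List[Task]) -> List[str]:
    """Return a summary line for every task that reported an error."""
    return [f"{t.name} on {t.node} ({t.state} {t.age}): {t.error}"
            for t in tasks if t.error]
```

Applied to the `dksps livesp_bach` output in the Example, `failure_messages` returns a single entry for the failed task, carrying the error `task: non-zero exit (137)`.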
Example
$ livesp-stability
{
  "name": "/api/v1/service/stability",
  "timestamp": "2020-08-05T14:15:54Z",
  "status": "KO",
  "severity": "HIGH",
  "messages": [
    "KO - livesp_bach - TaskName: livesp_bach.1 - State: Running 29 hours ago - Criticity: HIGH"
  ]
}
$ dksps livesp_bach
ID                          NAME                NODE     CURRENT STATE            ERROR
q2phpr7ng5fg167435zor6uii   livesp_bach.1       bonite   Running 29 hours ago
q5txlvhq98fggde0uejlwog8d   \_ livesp_bach.1    bonite   Shutdown 29 hours ago
tbvc3mjcul5rmy7h89e45pthi   \_ livesp_bach.1    bonite   Shutdown 2 days ago
z3nu29btw3fwcl2bnsclauv78   \_ livesp_bach.1    bonite   Shutdown 2 days ago
tr8z8whv8ht6mcoa5q4z80zws   \_ livesp_bach.1    bonite   Failed 2 days ago        "task: non-zero exit (137)"
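In this example, the oldest task failed with exit code 137. By the usual convention, a code above 128 means the process was killed by signal (code - 128), so 137 corresponds to signal 9, SIGKILL; common causes are the kernel OOM killer or a forced stop after the stop grace period. The hypothetical helper below decodes such codes with Python's `signal` module; it is a sketch, and its regular expression assumes the error text format shown above.

```python
import re
import signal
from typing import Optional


def describe_exit(error: str) -> Optional[str]:
    """Explain a task error such as 'task: non-zero exit (137)'.

    Returns None when the text contains no exit code.
    """
    match = re.search(r"non-zero exit \((\d+)\)", error)
    if match is None:
        return None
    code = int(match.group(1))
    # Convention: a process killed by signal N is reported as 128 + N.
    if code > 128:
        try:
            return f"exit {code}: killed by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # no signal with that number on this platform
    return f"exit {code}: non-signal exit status"


print(describe_exit("task: non-zero exit (137)"))  # on Linux: exit 137: killed by SIGKILL
```

A process can also exit with a status above 128 on its own, so treat the signal name as a strong hint to confirm in the application logs (step 4), not as proof.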